Abstract. In this paper we discuss a procedure to improve the usual
estimator of a linear functional of the unknown regression function in inverse
nonparametric regression models. In Klaassen et al. (2001) it has been proved
that this traditional estimator is not asymptotically efficient (in the sense of
the Hájek-Le Cam convolution theorem) except, possibly, when the error
distribution is normal. Since this estimator, however, is still root-n consistent,
a procedure in Bickel et al. (1993) applies to construct a modification which
is asymptotically efficient. A self-contained proof of the asymptotic efficiency
is included.



1. Introduction 

In the nonparametric regression model with either direct or indirect observa- 
tions, both the parameter of actual interest (the unknown regression function) and 
the nuisance parameter (the error density that may also be unknown) are infinite 
dimensional. In this paper we consider the problem of asymptotically efficient (AE) 
estimation of a linear functional of the regression function. In other words, our aim 
is to construct estimators that are asymptotically normal with smallest possible 
variance. Linear functionals are of independent interest and often studied in the 
literature (Ibragimov & Hasminskii (1984), Goldenshluger & Pereverzev (2000)). 
They are also important because the Fourier coefficients of an expansion of the 
regression function in an orthonormal basis are linear functionals. Below we will 
briefly return to the latter aspect. 

Suitable estimators of the regression function naturally yield estimators of the 
linear functional by substitution in the inner product representing this functional. 
In van Rooij et al. (1999) it has been proved that, when a linear functional of 
an indirectly sampled density is to be estimated, substitution of a suitable density 
estimator produces an AE estimator of the functional. Even when the error density 
is known but arbitrary, the situation to be considered here, AE estimation of a 
linear functional of the regression function turns out to be an essentially harder 
problem. Klaassen et al. (2001) have shown that substitution of the usual type of 
regression function estimator does not produce an AE estimator of the functional, 
except possibly when the error distribution is normal. Another, somewhat simpler 
natural estimator could be easily proposed for linear functionals, but this estimator 
is equivalent with and in certain cases even identical to the plug-in estimator and 
hence not AE either. It follows in particular that the usual orthonormal series 
type estimator of the regression function is obtained from estimators of the Fourier
coefficients that are not AE in the above sense. Nevertheless such estimators do in 
general attain the best convergence rate for the mean integrated squared error. 

In this paper we will focus on the question of how to improve the plug-in esti- 
mator of a linear functional. The plug-in-method is a simple device to construct 
an estimator of just about any functional. We will see, moreover, that an estima- 
tor of the regression function will also be needed in the improvement procedure. 
Although not AE, the plug-in estimator is $\sqrt{n}$-consistent. This means that the
method to construct an AE estimator given a $\sqrt{n}$-consistent one, described in
great generality in Bickel et al. (1993) for infinite dimensional parameters, applies.
See also Pfanzagl (1994) for improving $\sqrt{n}$-consistent estimators in models with
finite dimensional parameter. In principle our result would fit into the general 
theory as described in particular in Chapter 7 of Bickel et al. (1993). We prefer, 
however, to provide a self-contained and independent derivation in this paper. On 
the one hand the indirect regression model is sufficiently specific to allow explicit 
calculations, on the other hand it is of sufficient importance to warrant such an 
effort. The question whether employing improved estimators of the Fourier coeffi- 
cients in a series type estimator of the regression function itself would improve this 
estimator in a certain way might be of some interest. Although this cannot be true 
for the rate, we suspect that improvement will hold true at the level of constants. 
Investigating this matter is beyond the scope of this paper. In Section 6, however, 
we will briefly comment on this. 

In order to construct an AE estimator one needs to compute the efficient influence 
function. According to van der Vaart (1998) this is the projection of the gradient 
of the functional onto the tangent space to the model. See Section 3 for the details. 
Statement and proof that the improved estimator is AE can be found in Section 4. 
Indirect nonparametric regression occurs in a wide variety of practical situations, 
like Wicksell's unfolding problem in stereology, geological prospecting, computer
tomography in imaging, just to mention a few (O'Sullivan (1986), Kress (1989), 
Kirsch (1996)). Let us introduce another example (see also Section 6.1). 

Example. Can you see the weight of a cable? To answer this question, a
paraphrase of the title of Kac's famous 1966 paper, let us first observe that the
shape of a cable suspended at its endpoints with coordinates (0,0) and (1,0) is
given by the differential equation

(1.1) $-g''(t) = f(t)$, $0 < t < 1$, $g(0) = g(1) = 0$.

Apart from the sign the source term $f$ represents the load per horizontal distance
and $g$ the shape. The problem is to estimate a linear functional $\int_0^1 f(t)\varphi(t)\,dt$,
for suitable $\varphi$, like for instance the total weight $\int_0^1 f(t)\,dt$ of the cable ($\varphi = 1$);
for this special case, however, see the remark in Section 6.1. Estimation may be
performed by first recovering the weight distribution or source term from the data
$(X_1, Y_1), \ldots, (X_n, Y_n)$ that are independent copies of $(X, Y)$, where

(1.2) $Y = g(X) + \varepsilon$.

We will assume that the design variable $X$ has the Uniform (0, 1) distribution and
is independent of the error variable $\varepsilon$. The latter has also zero mean, finite variance,
and arbitrary density $\psi$ that is supposed to be known.
Using the Green's function the differential equation can be rewritten in the form

(1.3) $g(x) = \int_0^1 K(x, t) f(t)\,dt = (Kf)(x)$, $0 < x < 1$,

where $K : L^2([0, 1]) \to L^2([0, 1])$ turns out to be compact and strictly positive
Hermitian with Brownian bridge covariance kernel

(1.4) $K(x,t) = x \wedge t - xt$, $0 < x < 1$, $0 < t < 1$.

The operator has strictly positive eigenvalues

(1.5) $\rho_k = \left\{ \frac{1}{(k+1)\pi} \right\}^2$, $k = 0, 1, \ldots$,

with corresponding orthonormal basis of eigenfunctions

(1.6) $\varphi_k(x) = \sqrt{2} \sin (k+1)\pi x$, $0 < x < 1$, $k = 0, 1, \ldots$.
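
The eigen-relation $K\varphi_k = \rho_k\varphi_k$ behind (1.5) and (1.6) can be checked numerically. The following is only a minimal sketch: the midpoint-rule discretization, the grid size `N`, and the helper names `brownian_bridge_kernel` and `apply_K` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def brownian_bridge_kernel(x, t):
    """Kernel (1.4): K(x, t) = min(x, t) - x * t."""
    return np.minimum(x, t) - x * t

def apply_K(values, grid):
    """Approximate (Kf)(x) = int_0^1 K(x, t) f(t) dt by the midpoint rule."""
    h = grid[1] - grid[0]
    kernel = brownian_bridge_kernel(grid[:, None], grid[None, :])
    return kernel @ values * h

N = 1000                                  # assumed number of grid cells
grid = (np.arange(N) + 0.5) / N           # midpoints of a uniform partition of (0, 1)

for k in range(4):
    phi_k = np.sqrt(2.0) * np.sin((k + 1) * np.pi * grid)   # eigenfunction (1.6)
    rho_k = 1.0 / ((k + 1) * np.pi) ** 2                    # eigenvalue (1.5)
    max_err = np.max(np.abs(apply_K(phi_k, grid) - rho_k * phi_k))
    print(k, rho_k, max_err)              # max_err should be small on a fine grid
```

The printed discrepancy is a discretization error only; refining the grid should shrink it further.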



The general theory below is tailored to but slightly more general than the com- 
pact case. This generalization ensures that the noncompact identity operator, which 
yields the usual direct regression, is included as a special case. 

2. The model, the problem, and assumptions 

Let $(\mathcal{X}, \mathbb{X}, \mu)$ and $(\mathcal{Z}, \mathbb{Z}, \nu)$ be measure spaces with $L^2(\mu)$ and $L^2(\nu)$ real
separable Hilbert spaces. We are given a bounded, injective linear operator $K :
L^2(\nu) \to L^2(\mu)$. In the random design case, to be considered here, we observe a
random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ of independent copies of a random element
$(X, Y)$, where

(2.1) $Y = (Kf)(X) + \varepsilon = g(X) + \varepsilon$, $f \in L^2(\nu)$.

The indirectly observed regression or input function $f$ on $(\mathcal{Z}, \mathbb{Z})$ is unknown. We
will assume that

(2.2) $\mu(\mathcal{X}) = 1$, and $X =_d$ Uniform $(\mathcal{X})$.

The error variable $\varepsilon$ is independent of the design $X$ with known but arbitrary
density $\psi$ with respect to Lebesgue measure. We will assume that

(2.3) $E\varepsilon = 0$, $E\varepsilon^2 = \sigma^2$.

Under these assumptions it is readily verified that $(X, Y)$ has density

(2.4) $p_f(x, y) = \psi(y - (Kf)(x))$, $(x, y) \in \mathcal{X} \times \mathbb{R}$, $f \in L^2(\nu)$.

A star attached to an operator denotes its adjoint. The operator $R = (K^*K)^{1/2} :
L^2(\nu) \to L^2(\nu)$, then, is strictly positive Hermitian. We will assume that there
exists an orthonormal basis for $L^2(\nu)$ consisting of eigenfunctions $\varphi_0, \varphi_1, \ldots$ of the
operator $R$ with corresponding eigenvalues $\rho_0, \rho_1, \ldots > 0$ satisfying

(2.5) $\sup_{k \ge 0} \rho_k < \infty$.

On the one hand compact operators $R$ are included, since for those a basis exists
with eigenvalues satisfying $\rho_k \downarrow 0$, as $k \to \infty$ (Debnath & Mikusinski (1999)).
On the other hand the direct model with $K = I$ and hence $R = I$, where $I$ is
the identity operator, is included as well. The operator $I$ is not compact but
satisfies the condition above for any basis, with $\rho_k = 1$ for all $k$. According to
the polar decomposition (Riesz & Nagy (1990)) there exists a partial isometry
$V : L^2(\nu) \to L^2(\mu)$ such that $K = VR$ and $K^* = RV^*$. It should be noted that
$V^*V$ is the identity on the range of $R$. Let us write

(2.6) $\varphi_{V,k} = V\varphi_k$,

and observe that the $\varphi_{V,k}$ are orthonormal in $L^2(\mu)$.
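
A finite-dimensional analogue may help to fix ideas about the polar decomposition. In the sketch below a random $6 \times 4$ matrix stands in for the operator $K$ (a purely hypothetical choice), and $R$, $V$, $\varphi_k$, $\rho_k$ and $\varphi_{V,k}$ are obtained from its singular value decomposition; none of this is the paper's construction, only an illustration of the relations $K = VR$ and $\varphi_{V,k} = V\varphi_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.normal(size=(6, 4))            # hypothetical injective "operator" R^4 -> R^6

# Singular value decomposition K = U diag(s) W^T.
U, s, Wt = np.linalg.svd(K, full_matrices=False)

R = Wt.T @ np.diag(s) @ Wt             # R = (K^T K)^{1/2}, symmetric positive definite
V = U @ Wt                             # isometry satisfying K = V R

phi = Wt.T                             # columns: eigenvectors phi_k of R
rho = s                                # eigenvalues rho_k of R
phi_V = V @ phi                        # columns: phi_{V,k} = V phi_k, cf. (2.6)

f = rng.normal(size=4)
fourier_direct = phi.T @ f                     # f_k = <f, phi_k>
fourier_via_K = (phi_V.T @ (K @ f)) / rho      # (1 / rho_k) <Kf, phi_{V,k}>

print(np.allclose(V @ R, K), np.allclose(fourier_direct, fourier_via_K))
```

The last line compares the Fourier coefficients computed directly with those recovered from $Kf$ alone, which is the mechanism that makes estimation from indirect observations possible.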

The problem to be considered here is estimation of a linear functional $f \mapsto
\langle f, \varphi \rangle$, $f \in L^2(\nu)$, for some given $\varphi \in L^2(\nu)$ with $\|\varphi\| = 1$. In view of $\langle f, \varphi \rangle
= \sum_{k \ge 0} \langle f, \varphi_k \rangle \langle \varphi, \varphi_k \rangle$ it seems plausible that estimation of the special linear
functionals defined by the Fourier coefficients

(2.7) $f_k = \langle f, \varphi_k \rangle$

might suffice. This is in fact true under an extra condition, and some details can
be found in Section 5. Hence we will focus on estimating an arbitrary Fourier
coefficient. As a generic example let us consider the functional

(2.8) $f \mapsto f_0 = \langle f, \varphi_0 \rangle$, $f \in L^2(\nu)$,
where $\varphi_0$ is the first basis element. It is useful to observe

(2.9) $f_k = \langle f, \varphi_k \rangle = \langle f, K^* V R^{-1} \varphi_k \rangle = \langle Kf, V R^{-1} \varphi_k \rangle = \frac{1}{\rho_k} \langle Kf, \varphi_{V,k} \rangle$.

The assumptions below will be briefly discussed in Section 6. The basis elements
are supposed to satisfy the uniform boundedness conditions

(2.10) $\sup_{z \in \mathcal{Z}, k \ge 0} |\varphi_k(z)| < \infty$
and

(2.11) $\sup_{x \in \mathcal{X}, k \ge 0} |\varphi_{V,k}(x)| < \infty$.

Regarding the input function $f$ it will be assumed that there exists a sequence
$(m(n))_{n \ge 1}$ satisfying

(2.12) $m = m(n) \to \infty$ and $\frac{m}{\sqrt{n}} \to 0$, as $n \to \infty$,
for which

(2.13) $\sqrt{n} \sum_{k > m} f_k^2 \to 0$, as $n \to \infty$,

(2.14) $\sum_{k > m} |f_k| \to 0$, as $n \to \infty$.

The error density $\psi$ is supposed to be twice differentiable, i.e.,

(2.15) $\psi''$ exists on $\mathbb{R}$.
Denoting the score function for location by

(2.16) $\Lambda = -\frac{\psi'}{\psi}$,
we will also need that

(2.17) $\Lambda'$ and $\Lambda''$ exist and are bounded on $\mathbb{R}$,
with finite Fisher information

(2.18) $E\Lambda^2(\varepsilon) = E\Lambda'(\varepsilon) = \int_{-\infty}^{\infty} \left( \frac{\psi'}{\psi} \right)^2 (y)\, \psi(y)\, dy = I(\psi) < \infty$.

For the first equality in (2.18) we need (2.15).
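
A brief sketch of why the first equality in (2.18) holds may be useful. It rests on the additional assumption, not spelled out in the text, that $\psi'$ vanishes at $\pm\infty$, so that $\int \psi''(y)\,dy = 0$:

```latex
\Lambda' = -\frac{\psi''}{\psi} + \Bigl(\frac{\psi'}{\psi}\Bigr)^{2},
\qquad
E\Lambda'(\varepsilon)
  = -\int_{-\infty}^{\infty} \psi''(y)\,dy
    + \int_{-\infty}^{\infty} \Bigl(\frac{\psi'}{\psi}\Bigr)^{2}(y)\,\psi(y)\,dy
  = 0 + E\Lambda^{2}(\varepsilon)
  = I(\psi).
```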

3. Construction of an asymptotically efficient estimator 

The input function $f \in L^2(\nu)$ has the $L^2$-expansion

(3.1) $f(z) = \sum_{k \ge 0} f_k \varphi_k(z)$, $z \in \mathcal{Z}$,

in the orthonormal basis of eigenfunctions, and the usual estimator of the regression
function in this context is given by

(3.2) $\hat f(z) = \hat f_{(m)}(z) = \sum_{k \le m} \hat f_k \varphi_k(z)$, $z \in \mathcal{Z}$,

for suitable $m = m(n) \to \infty$, as $n \to \infty$, where $\hat f_k$ is the estimator of $f_k$ (see (2.1)
and (2.9)) given by

(3.3) $\hat f_k = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\rho_k} Y_i \varphi_{V,k}(X_i)$.

See, for instance, Johnstone & Silverman (1990). Since we are estimating $f_0$ in
(2.8), a plug-in estimator simply equals

(3.4) $\langle \hat f, \varphi_0 \rangle = \hat f_0$
as given by (3.3) for $k = 0$.
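
To make (3.3) and (3.4) concrete, here is a small simulation sketch for the cable example of Section 1, where $\varphi_{V,k} = \varphi_k$. The input function (a combination of the first two eigenfunctions), the logistic error distribution, the sample size, and the seed are all assumptions chosen for illustration only.

```python
import numpy as np

def phi(k, x):
    """Eigenfunction sqrt(2) sin((k+1) pi x) from (1.6); here phi_{V,k} = phi_k."""
    return np.sqrt(2.0) * np.sin((k + 1) * np.pi * x)

def rho(k):
    """Eigenvalue 1 / ((k+1) pi)^2 from (1.5)."""
    return 1.0 / ((k + 1) * np.pi) ** 2

def fourier_estimate(k, x, y):
    """Empirical Fourier coefficient (3.3): average of Y_i phi_{V,k}(X_i) / rho_k."""
    return np.mean(y * phi(k, x)) / rho(k)

rng = np.random.default_rng(1)
n = 20000
true_coeffs = np.array([30.0, -10.0])     # hypothetical f = 30 phi_0 - 10 phi_1
x = rng.uniform(size=n)                   # Uniform(0, 1) design
g = sum(c * rho(k) * phi(k, x) for k, c in enumerate(true_coeffs))  # g = Kf
y = g + rng.logistic(size=n)              # zero-mean errors with known density

f0_hat = fourier_estimate(0, x, y)        # plug-in estimator (3.4) of f_0 = 30
print(f0_hat)
```

Because of the factor $1/\rho_k$, the variability of `fourier_estimate(k, x, y)` grows quickly with $k$, which is the usual ill-posedness of the inverse problem.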

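The estimators $\hat f_k$ $(k = 0, 1, \ldots)$ have some desirable properties. Because of

(3.5) $E\hat f_k = \frac{1}{\rho_k} E Y \varphi_{V,k}(X) = \frac{1}{\rho_k} E (Kf)(X) \varphi_{V,k}(X)$
      $= \frac{1}{\rho_k} \langle Kf, V\varphi_k \rangle = \frac{1}{\rho_k} \langle f, R V^* V \varphi_k \rangle$
      $= \frac{1}{\rho_k} \rho_k \langle f, \varphi_k \rangle = f_k$

they are unbiased, but $\hat f = \hat f_{(m)}$ is not; let us write

(3.6) $f_{(m)}(z) = E \hat f_{(m)}(z) = \sum_{k \le m} f_k \varphi_k(z)$, $z \in \mathcal{Z}$.

Furthermore we have

(3.7) $E(\hat f_k - f_k)^2 = \mathrm{Var}\, \hat f_k = \frac{1}{n} \mathrm{Var}\, \frac{1}{\rho_k} Y \varphi_{V,k}(X)$
      $\le \frac{1}{n} \left( \frac{1}{\rho_k} \right)^2 E Y^2 \varphi_{V,k}^2(X) \le \frac{C}{n} \left( \frac{1}{\rho_k} \right)^2$,

where $0 < C < \infty$ will throughout be used as a generic constant that will not
depend on $n$ or $k$ and that here can be taken equal to

(3.8) $C = \left( \sup_{x \in \mathcal{X}, k \ge 0} |\varphi_{V,k}(x)| \right)^2 \cdot \left( \|Kf\|^2 + \sigma^2 \right)$

by assumption (2.11).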
The central limit theorem yields at once the asymptotic normality of the empirical
Fourier coefficients. In particular we have

(3.9) $\sqrt{n} (\hat f_0 - f_0) \to_d$ Normal $(0, \sigma_0^2(f))$, as $n \to \infty$,
where (cf. (2.1), (2.3), and (2.6))

(3.10) $\sigma_0^2(f) = \left( \frac{1}{\rho_0} \right)^2 \mathrm{Var}(Y \varphi_{V,0}(X)) \ge \left( \frac{1}{\rho_0} \right)^2 E\, \mathrm{Var}(Y \varphi_{V,0}(X) \mid X) = \left( \frac{\sigma}{\rho_0} \right)^2$

holds with a strict inequality, unless $\mathrm{Var}(E(Y \varphi_{V,0}(X) \mid X))$ vanishes, i.e., unless
$(Kf)(X) \varphi_{V,0}(X)$ is degenerate.

It has already been observed in Klaassen et al. (2001) that $\sigma_0^2(f)$ is in general
strictly larger than the optimal variance according to the Hájek-Le Cam convolution
theorem (van der Vaart (1998)). In other words the estimator $\hat f_0$ is not
asymptotically efficient, but it is $\sqrt{n}$-consistent. Such estimators can be improved.
In order to do so we first need to briefly review some results from Klaassen et al.
(2001).

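Since we assume $\psi$ to be known, in terms of square roots of the densities the
model is $\mathcal{S} = \{ s_f, f \in L^2(\nu) \}$, with $s_f = \sqrt{p_f}$ and $p_f$ as in (2.4). The tangent space
at $f \in L^2(\nu)$ to this model is given by

(3.11) $\dot{\mathcal{S}}_f = [\dot s_{f,h}, h \in \mathfrak{R}_K] = \{ \dot s_{f,h}, h \in \overline{\mathfrak{R}}_K \} \subset L^2(\mu \times \lambda)$,

where $\lambda$ is Lebesgue measure on $\mathbb{R}$, $\mathfrak{R}_K$ is the range of $K$, $\overline{\mathfrak{R}}_K$ its closure, and

(3.12) $\dot s_{f,h}(x, y) = -\frac{\psi'(y - (Kf)(x))}{2 s_f(x, y)}\, h(x)$, $x \in \mathcal{X}$, $y \in \mathbb{R}$.

The gradient of the functional $f \mapsto T(s_f) = \langle f, \varphi_0 \rangle$ at $f$ is given by (cf. (2.9))

(3.13) $\dot T_f(x, y) = \frac{2}{\rho_0} (y - (Kf)(x))\, \varphi_{V,0}(x)\, s_f(x, y)$, $x \in \mathcal{X}$, $y \in \mathbb{R}$.

Let $\tilde T_f$ denote the projection in $L^2(\mu \times \lambda)$ of $\dot T_f$ onto $\dot{\mathcal{S}}_f$. Then the optimal variance
mentioned in the preceding paragraph equals

(3.14) $\tilde\sigma_0^2(f) = \frac{1}{4} \|\tilde T_f\|^2$.

For these results see Klaassen et al. (2001).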

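In order to construct an estimator with limiting normal distribution having the
variance in (3.14) we first need to explicitly compute $\tilde T_f$. Because $\tilde T_f$ has to belong
to $\dot{\mathcal{S}}_f$ it is of the form

(3.15) $\tilde T_f(x, y) = -\frac{\psi'(y - (Kf)(x))}{2 s_f(x, y)}\, h(x)$, $x \in \mathcal{X}$, $y \in \mathbb{R}$,

for some $h \in \overline{\mathfrak{R}}_K$, and because of $\dot T_f - \tilde T_f \perp \dot{\mathcal{S}}_f$ we must have (see also (3.12))

(3.16) $\langle \dot T_f - \tilde T_f, \dot s_{f,\tilde h} \rangle = \iint \left\{ \frac{2}{\rho_0} (y - (Kf)(x))\, \varphi_{V,0}(x)\, s_f(x, y) + \frac{\psi'(y - (Kf)(x))}{2 s_f(x, y)}\, h(x) \right\}$
       $\times \left\{ -\frac{\psi'(y - (Kf)(x))}{2 s_f(x, y)}\, \tilde h(x) \right\} d\mu(x)\, dy = 0$,

for all $\tilde h \in \mathfrak{R}_K$. It follows from (2.3) and (2.15) that $\int_{-\infty}^{\infty} y \psi'(y)\, dy = -1$ holds;
cf. Lemma I.2.4.b of Hájek and Šidák (1967). Exploiting this fact and using the
notation $I(\psi)$ defined in (2.18) straightforward integration shows that (3.16) entails

(3.17) $\langle I(\psi) h - \frac{4}{\rho_0} \varphi_{V,0}, \tilde h \rangle = 0$, for all $\tilde h \in \mathfrak{R}_K$.

It is obvious that $\frac{4}{\rho_0} \varphi_{V,0} \in \mathfrak{R}_K$. Since $h \in \overline{\mathfrak{R}}_K$ it follows from (3.17) that

(3.18) $h(x) = \frac{4}{\rho_0 I(\psi)} \varphi_{V,0}(x)$, $x \in \mathcal{X}$.
Combination with (3.14) yields

(3.19) $\tilde\sigma_0^2(f) = \frac{1}{4} \|\tilde T_f\|^2 = \frac{1}{\rho_0^2 I(\psi)}$

for the value of the optimal variance. Note that

$1 = \left( \int_{-\infty}^{\infty} y \psi'(y)\, dy \right)^2 \le \sigma^2 I(\psi)$

holds by the Cramér-Rao inequality with equality if and only if $\psi'(Y)/\psi(Y)$ is
linear in $Y$ a.s. under $\psi$, i.e. if and only if $\psi$ is a normal density.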

Consequently, it is immediate from (3.10) that the string of inequalities

(3.20) $\sigma_0^2(f) \ge \left( \frac{1}{\rho_0} \right)^2 \sigma^2 \ge \left( \frac{1}{\rho_0} \right)^2 \frac{1}{I(\psi)} = \tilde\sigma_0^2(f)$

holds. For nonnormal densities $\psi$ the second inequality is strict, and the estimator
$\hat f_0$ then turns out not to be asymptotically efficient. If $\psi$ is normal, we have $I(\psi) =
1/\sigma^2$, so that the second inequality in (3.20) is an equality. However, asymptotic
efficiency of $\hat f_0$ remains impossible, since the first inequality in (3.20) cannot be an
equality for all $f$, as argued in (3.10). See Klaassen et al. (2001) for some details.

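We are now in a position to construct the improved estimator. According to van
der Vaart (1998) define the efficient influence function by

(3.21) $\tilde T_f(x, y) \frac{1}{2 s_f(x, y)} = -\frac{1}{I(\psi)} \frac{\psi'(y - (Kf)(x))}{\psi(y - (Kf)(x))} \frac{1}{\rho_0} \varphi_{V,0}(x)$
       $= \frac{1}{I(\psi)} \Lambda(y - (Kf)(x)) \frac{1}{\rho_0} \varphi_{V,0}(x)$, $x \in \mathcal{X}$, $y \in \mathbb{R}$,

using the notation $\Lambda$ introduced in (2.16). Following a procedure in Bickel et al.
(1993, Chapter 7) let us now introduce the estimator

(3.22) $\tilde f_0 = \hat f_0 + \frac{1}{n} \sum_{i=1}^{n} \frac{1}{I(\psi)} \Lambda(Y_i - (K \hat f_{(m)})(X_i)) \frac{1}{\rho_0} \varphi_{V,0}(X_i)$,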






CHRIS A.J. KLAASSEN, EUN-JOO LEE, AND FRITS H. RUYMGAART 



where $\hat f_{(m)} = \hat f$ is defined in (3.2). We will see in the next section that this turns
out to be an asymptotically efficient estimator of $f_0$.
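
The one-step construction (3.22) can be sketched in a few lines for the cable example with logistic errors. Everything specific here is an assumption made for illustration: the operator is the one of Section 1 (so $\varphi_{V,k} = \varphi_k$), the error density is the standard logistic one, for which $\Lambda(u) = \tanh(u/2)$ and $I(\psi) = 1/3$, and the function name `improved_f0` and the simulated input are hypothetical.

```python
import numpy as np

def phi(k, x):
    """Eigenfunction (1.6); in the cable example phi_{V,k} = phi_k."""
    return np.sqrt(2.0) * np.sin((k + 1) * np.pi * x)

def rho(k):
    """Eigenvalue (1.5)."""
    return 1.0 / ((k + 1) * np.pi) ** 2

def score(u):
    """Location score Lambda = -psi'/psi of the standard logistic density."""
    return np.tanh(u / 2.0)

FISHER_INFO = 1.0 / 3.0   # I(psi) for the standard logistic density

def improved_f0(x, y, m):
    """Return the plug-in estimate (3.4) and the one-step estimate (3.22) of f_0."""
    f_hat = np.array([np.mean(y * phi(k, x)) / rho(k) for k in range(m + 1)])  # (3.3)
    # (K f_hat_(m))(X_i) = sum over k <= m of f_hat_k * rho_k * phi_{V,k}(X_i)
    g_hat = sum(f_hat[k] * rho(k) * phi(k, x) for k in range(m + 1))
    correction = np.mean(score(y - g_hat) * phi(0, x)) / (FISHER_INFO * rho(0))
    return f_hat[0], f_hat[0] + correction

rng = np.random.default_rng(2)
n, m = 20000, 3                                   # assumed sample size and truncation
x = rng.uniform(size=n)
y = 30.0 * rho(0) * phi(0, x) + rng.logistic(size=n)   # hypothetical f = 30 phi_0
print(improved_f0(x, y, m))
```

A single run cannot show the variance reduction; repeating the simulation over many seeds and comparing the empirical variances of the two returned values would be the natural next step.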

4. The main theorem 

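Theorem. Suppose that all the assumptions listed in Section 2 are fulfilled.
Then $\tilde f_0$ defined in (3.22) is an asymptotically efficient estimator of $f_0$, i.e.

(4.1) $\sqrt{n} (\tilde f_0 - f_0) \to_d$ Normal $\left( 0, \frac{1}{\rho_0^2 I(\psi)} \right)$, as $n \to \infty$,

where the variance in the normal distribution is optimal.
Proof. A Taylor expansion yields

(4.2) $\sqrt{n} (\tilde f_0 - f_0) = A_n + Q_n + R_n$
with

(4.3) $A_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{\rho_0 I(\psi)} \Lambda(\varepsilon_i) \varphi_{V,0}(X_i)$,

(4.4) $Q_n = \sqrt{n} \langle \hat f - f, \varphi_0 \rangle - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{\rho_0 I(\psi)} \{ (K \hat f)(X_i) - (Kf)(X_i) \} \Lambda'(\varepsilon_i) \varphi_{V,0}(X_i)$,

(4.5) $R_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{2 \rho_0 I(\psi)} \{ (K \hat f)(X_i) - (Kf)(X_i) \}^2 \Lambda''(\varepsilon_i^*) \varphi_{V,0}(X_i)$.

In (4.5), $\varepsilon_i^*$ is a random variable between $Y_i - (K \hat f)(X_i)$ and $Y_i - (Kf)(X_i) = \varepsilon_i$.
It follows from (2.11) that

(4.6) $\sup_{x \in \mathcal{X}} |\varphi_{V,0}(x)| < \infty$.

Together with (2.17) we find ($C$ generic!)

(4.7) $|R_n| \le \frac{C}{\sqrt{n}} \sum_{i=1}^{n} \{ (K \hat f)(X_i) - (Kf)(X_i) \}^2$
      $= \frac{C}{\sqrt{n}} \sum_{i=1}^{n} \left\{ \sum_{k \le m} (\hat f_k - f_k) \rho_k \varphi_{V,k}(X_i) - \sum_{k > m} f_k \rho_k \varphi_{V,k}(X_i) \right\}^2$
      $\le \frac{C}{\sqrt{n}} \sum_{k \le m} \sum_{l \le m} (\hat f_k - f_k)(\hat f_l - f_l) \rho_k \rho_l \left\{ \sum_{i=1}^{n} \varphi_{V,k}(X_i) \varphi_{V,l}(X_i) \right\}$
      $+ \frac{C}{\sqrt{n}} \sum_{k > m} \sum_{l > m} f_k f_l \rho_k \rho_l \left\{ \sum_{i=1}^{n} \varphi_{V,k}(X_i) \varphi_{V,l}(X_i) \right\}$
      $= R_{n1} + R_{n2}$.

It clearly suffices to show that $E R_{nj} \to 0$, as $n \to \infty$, for $j = 1, 2$, in order to obtain
$R_n \to_p 0$, as $n \to \infty$.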
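Regarding the first term we see that

(4.8) $E R_{n1} = \frac{C}{n^2 \sqrt{n}} \sum_{k \le m} \sum_{l \le m} \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \sum_{i=1}^{n} E[ \{ Y_\alpha \varphi_{V,k}(X_\alpha) - f_k \rho_k \}$
      $\times \{ Y_\beta \varphi_{V,l}(X_\beta) - f_l \rho_l \} \{ \varphi_{V,k}(X_i) \varphi_{V,l}(X_i) \} ]$.

By decomposing $\sum \sum \sum_{\alpha, \beta, i} = \sum \sum \sum_{\alpha \ne \beta,\, i} + \sum \sum_{\alpha = \beta \ne i} + \sum_{\alpha = \beta = i}$,
and by realizing that the random variables labeled with $\alpha$
and $\beta$ are centered at 0 and that the $\varphi_{V,k}$ are orthonormal, we arrive at ($\delta_{kl}$ is
Kronecker's delta)

(4.9) $0 \le E R_{n1} \le \frac{C}{n^2 \sqrt{n}} \sum_{k \le m} \sum_{l \le m} \{ n(n-1) \delta_{kl}\, \mathrm{Cov}(Y \varphi_{V,k}(X), Y \varphi_{V,l}(X))$
      $+ n E (Y \varphi_{V,k}(X) - f_k \rho_k)(Y \varphi_{V,l}(X) - f_l \rho_l) \varphi_{V,k}(X) \varphi_{V,l}(X) \}$
      $\le C \left( \frac{m}{\sqrt{n}} + \frac{m^2}{n \sqrt{n}} \right) \to 0$, as $n \to \infty$.

Here we have used (2.11) and (2.12). See also the calculation in (3.7).
Next let us observe that

(4.10) $0 \le E R_{n2} = \frac{C}{\sqrt{n}} \sum_{k > m} \sum_{l > m} n f_k f_l \rho_k \rho_l \delta_{kl}$
       $\le C \sqrt{n} \sum_{k > m} f_k^2 \rho_k^2$
       $\le C \sqrt{n} \sum_{k > m} f_k^2 \to 0$, as $n \to \infty$,

by assumption (2.13). This settles the asymptotic negligibility of $R_n$.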
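For brevity let us introduce

(4.11) $U_k = \frac{1}{n} \sum_{i=1}^{n} \left\{ \langle \varphi_0, \varphi_k \rangle - \frac{1}{\rho_0 I(\psi)} \rho_k \varphi_{V,k}(X_i) \Lambda'(\varepsilon_i) \varphi_{V,0}(X_i) \right\}$,

and note that $U_k$ is an average of i.i.d. random variables with zero mean. Indeed
we have (cf. (2.18))

(4.12) $\langle \varphi_0, \varphi_k \rangle - \frac{1}{\rho_0 I(\psi)} \rho_k E \varphi_{V,k}(X) \Lambda'(\varepsilon) \varphi_{V,0}(X)$
       $= \delta_{k0} - \frac{1}{\rho_0 I(\psi)} \rho_k I(\psi) \delta_{k0} = 0$, for all $k$.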
Because of

(4.13) $Q_n = \sqrt{n} \sum_{k \le m} (\hat f_k - f_k) U_k - \sqrt{n} \sum_{k > m} f_k U_k$
       $= Q_{n1} + Q_{n2}$,

in order to show that $Q_n \to_p 0$, as $n \to \infty$, it suffices to prove $E|Q_{n1}| \to 0$ and
$E|Q_{n2}| \to 0$.

In order to deal with $Q_{n1}$ let us first note that

(4.14) $E(\hat f_k - f_k)^2 U_k^2 = \frac{1}{n^4} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} E \{ Y_i \varphi_{V,k}(X_i) - \rho_k f_k \} \{ Y_j \varphi_{V,k}(X_j) - \rho_k f_k \}$
       $\times \{ \langle \varphi_0, \varphi_k \rangle / \rho_k - (\rho_0 I(\psi))^{-1} \varphi_{V,k}(X_\alpha) \Lambda'(\varepsilon_\alpha) \varphi_{V,0}(X_\alpha) \}$
       $\times \{ \langle \varphi_0, \varphi_k \rangle / \rho_k - (\rho_0 I(\psi))^{-1} \varphi_{V,k}(X_\beta) \Lambda'(\varepsilon_\beta) \varphi_{V,0}(X_\beta) \}$.

Since each factor has zero expectation the only terms that contribute are those
with all 4 indices equal and those with 2 pairs of indices (but not all 4) equal. By
(2.11), (2.17), and (3.7) the contribution of the first group is seen to be bounded
by $C \cdot n$ and the second group is seen to yield a contribution bounded by $C \cdot n^2$.
This entails

(4.15) $E(\hat f_k - f_k)^2 U_k^2 \le \frac{1}{n^4} (C \cdot n + C \cdot n^2) \le \frac{C}{n^2}$

and hence, by applying Schwarz's inequality,

(4.16) $E|Q_{n1}| \le \sqrt{n} \sum_{k \le m} \{ E(\hat f_k - f_k)^2 U_k^2 \}^{1/2}$
       $\le \sqrt{n}\, m\, \frac{C}{n} = C \frac{m}{\sqrt{n}} \to 0$, as $n \to \infty$,

by assumption (2.12).
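For $Q_{n2}$ observe that

(4.17) $E U_k^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E \left\{ \langle \varphi_0, \varphi_k \rangle - \frac{1}{\rho_0 I(\psi)} \rho_k \varphi_{V,k}(X_i) \Lambda'(\varepsilon_i) \varphi_{V,0}(X_i) \right\}$
       $\times \left\{ \langle \varphi_0, \varphi_k \rangle - \frac{1}{\rho_0 I(\psi)} \rho_k \varphi_{V,k}(X_j) \Lambda'(\varepsilon_j) \varphi_{V,0}(X_j) \right\}$
       $\le \frac{1}{n} \left( \frac{\rho_k}{\rho_0 I(\psi)} \right)^2 E \varphi_{V,k}^2(X) (\Lambda'(\varepsilon))^2 \varphi_{V,0}^2(X)$
       $\le \frac{C}{n} \rho_k^2$.

This yields, again by the Schwarz inequality,

(4.18) $E|Q_{n2}| \le \sqrt{n} \sum_{k > m} |f_k| \{ E U_k^2 \}^{1/2}$
       $\le C \sum_{k > m} |f_k| \rho_k \to 0$, as $n \to \infty$,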



ASYMPTOTICALLY EFFICIENT ESTIMATION 






by assumptions (2.5) and (2.14). 

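Finally let us consider $A_n$ in (4.3). Since the terms are i.i.d. with

(4.19) $E \frac{1}{\rho_0 I(\psi)} \Lambda(\varepsilon) \varphi_{V,0}(X) = 0$,

(4.20) $\mathrm{Var}\, \frac{1}{\rho_0 I(\psi)} \Lambda(\varepsilon) \varphi_{V,0}(X) = \frac{1}{\rho_0^2 I(\psi)}$,

the central limit theorem entails at once that

(4.21) $A_n \to_d$ Normal $\left( 0, \frac{1}{\rho_0^2 I(\psi)} \right)$, as $n \to \infty$.

Because we have seen that $R_n \to_p 0$ and $Q_n \to_p 0$, this is also the limiting
distribution of the expression on the left in (4.2), as was to be shown.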



5. Estimating an arbitrary linear functional 

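In this section we want to consider the problem of estimating the linear functional
$f_\varphi = \langle f, \varphi \rangle$, for an arbitrary $\varphi \in L^2(\nu)$ with $\|\varphi\| = 1$. Since $f_\varphi =
\sum_{k \ge 0} \langle f, \varphi_k \rangle \langle \varphi, \varphi_k \rangle = \sum_{k \ge 0} f_k \langle \varphi, \varphi_k \rangle$, in view of the preceding results we expect

(5.1) $\tilde f_\varphi = \sum_{k \ge 0} \tilde f_k \langle \varphi, \varphi_k \rangle$,

to be an asymptotically efficient estimator. Before proceeding we need to introduce
the extra condition that

(5.2) $\sum_{k \ge 0} \frac{1}{\rho_k} |\langle \varphi, \varphi_k \rangle| < \infty$.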



This condition ensures that $\varphi$ is in the domain of $(K^{-1})^*$. Let us write
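(5.3) $\gamma = (K^{-1})^* \varphi = V R^{-1} \left( \sum_{k \ge 0} \langle \varphi, \varphi_k \rangle \varphi_k \right) = \sum_{k \ge 0} \frac{1}{\rho_k} \langle \varphi, \varphi_k \rangle \varphi_{V,k}$.

Thanks to (2.10) and (5.2) the convergence in (5.3) is even pointwise.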

To verify the asymptotic efficiency let us first observe that the optimal variance
in the normal component of regular sequences of estimators equals

(5.4) $\tilde\sigma_\varphi^2(f) = \frac{\|\gamma\|^2}{I(\psi)}$.

This can be shown by virtually the same method as employed in Section 3 for $\varphi_0$.
Writing the decomposition in (4.2) for general $\varphi_k$ (rather than for $\varphi_0$) as

(5.5) $\sqrt{n} (\tilde f_k - f_k) = A_{n,k} + Q_{n,k} + R_{n,k}$,
we have

(5.6) $\sqrt{n} (\tilde f_\varphi - f_\varphi) = \sum_{k \ge 0} \langle \varphi, \varphi_k \rangle A_{n,k} + \sum_{k \ge 0} \langle \varphi, \varphi_k \rangle (Q_{n,k} + R_{n,k})$
      $= S_{n,1} + S_{n,2}$.

As follows from (5.3),

(5.7) $S_{n,1} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{I(\psi)} \Lambda(\varepsilon_i) \sum_{k \ge 0} \frac{1}{\rho_k} \langle \varphi, \varphi_k \rangle \varphi_{V,k}(X_i)$,

which means that $S_{n,1}$ is well defined. Exploiting the calculations in the proof of
the Theorem in Section 4 it follows that $S_{n,2}$ is also well defined and that

(5.8) $\sum_{k \ge 0} \langle \varphi, \varphi_k \rangle (Q_{n,k} + R_{n,k}) = o_p(1)$.

Combination yields the following result.

Theorem. Suppose that condition (5.2) is fulfilled in addition to the assumptions
in Section 2. Then $\tilde f_\varphi$ is an asymptotically efficient estimator of $f_\varphi$, i.e.

(5.9) $\sqrt{n} (\tilde f_\varphi - f_\varphi) \to_d$ Normal $\left( 0, \frac{\|\gamma\|^2}{I(\psi)} \right)$, as $n \to \infty$.

6. Some comments on the conditions and 
improved regression estimation 

6.1. The conditions. In the example of Section 1 the operator $K$ itself is
Hermitian and hence $V = I$ so that

(6.1) $\varphi_k(x) = \varphi_{V,k}(x) = \sqrt{2} \sin (k+1)\pi x$, $0 < x < 1$, $k = 0, 1, \ldots$.

It is immediate that conditions (2.5), (2.10), and (2.11) are fulfilled. It should be
noted, however, that the function $\varphi = 1$ on [0, 1] doesn't satisfy (5.2). If the total
weight is to be estimated, one should therefore employ a sufficiently smooth (near
0 and 1) approximation of this function.
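
The failure of (5.2) for $\varphi = 1$ can be made explicit. With the basis (6.1) one has $\langle 1, \varphi_k \rangle = \sqrt{2}\,(1 - (-1)^{k+1}) / ((k+1)\pi)$, so the terms of the series in (5.2) grow linearly in $k$ along even $k$. The sketch below only evaluates partial sums; the function names are hypothetical.

```python
import numpy as np

def inner_one_phi(k):
    """<1, phi_k> = sqrt(2) (1 - (-1)^(k+1)) / ((k+1) pi) for the basis (6.1)."""
    j = k + 1
    return np.sqrt(2.0) * (1.0 - (-1.0) ** j) / (j * np.pi)

def partial_sum(k_max):
    """Partial sum of sum_k |<1, phi_k>| / rho_k over k = 0, ..., k_max."""
    k = np.arange(k_max + 1)
    rho = 1.0 / ((k + 1) * np.pi) ** 2          # eigenvalues (1.5)
    return np.sum(np.abs(inner_one_phi(k)) / rho)

for k_max in (9, 99, 999):
    print(k_max, partial_sum(k_max))            # grows roughly like k_max ** 2
```

The quadratic growth of the partial sums reflects that the constant function does not vanish at the endpoints, which is why a smooth approximation near 0 and 1 is recommended above.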

Next suppose that the input function $f$ satisfies $|\langle f, \varphi_k \rangle| = |f_k| \asymp k^{-s}$ for some
$s > 2$, and that $m(n) \asymp n^r$ for some $0 < r < \frac{1}{2}$. Then we have

(6.2) $\sqrt{n} \sum_{k > m} f_k^2 = O(n^{1/2 + r(1 - 2s)})$,

(6.3) $\sum_{k > m} |f_k| = O(n^{r(1 - s)})$,

and apparently conditions (2.13) and (2.14) are satisfied if

(6.4) $0 < r < \frac{1}{2}$, $s > \frac{1 + 2r}{4r}$.
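
The trade-off in (6.4) between the growth rate $r$ of the truncation index and the smoothness $s$ is simple arithmetic on the exponents in (6.2) and (6.3). A small helper, with hypothetical function names, makes it easy to tabulate:

```python
def min_smoothness(r):
    """Lower bound (1 + 2r) / (4r) on s from (6.4), for 0 < r < 1/2."""
    if not 0.0 < r < 0.5:
        raise ValueError("r must lie strictly between 0 and 1/2")
    return (1.0 + 2.0 * r) / (4.0 * r)

def exponents(r, s):
    """Exponents of n in the bounds (6.2) and (6.3); both should be negative."""
    return 0.5 + r * (1.0 - 2.0 * s), r * (1.0 - s)

for r in (0.1, 0.25, 0.4):
    print(r, min_smoothness(r), exponents(r, 2.0 * min_smoothness(r)))
```

Slowly growing truncation indices (small $r$) demand much smoother input functions, while $r$ close to $1/2$ brings the threshold down towards $s > 1$.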
The conditions on the error density also appear to be usually fulfilled. A nonnormal
density that satisfies conditions (2.15) - (2.18) is, for instance, the logistic
density

(6.5) $\psi(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$, $x \in \mathbb{R}$.

In particular $\Lambda'$ and $\Lambda''$ turn out to be bounded indeed.
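
For the logistic density the quantities entering (2.18) and (3.20) can be checked numerically. The score is $\Lambda(x) = \tanh(x/2)$ with $\Lambda'(x) = \frac{1}{2}\,\mathrm{sech}^2(x/2)$, and the known values are $I(\psi) = 1/3$ and $\sigma^2 = \pi^2/3$. The integration grid and the helper `trapezoid` below are assumptions of this sketch.

```python
import numpy as np

def trapezoid(values, grid):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy helpers."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

y = np.linspace(-40.0, 40.0, 400001)
# Logistic density (6.5), written in its symmetric, overflow-safe form.
psi = np.exp(-np.abs(y)) / (1.0 + np.exp(-np.abs(y))) ** 2
score = np.tanh(y / 2.0)                       # Lambda = -psi'/psi
score_prime = 0.5 / np.cosh(y / 2.0) ** 2      # Lambda', bounded by 1/2

fisher_info = trapezoid(score ** 2 * psi, y)   # E Lambda^2(eps)
fisher_alt = trapezoid(score_prime * psi, y)   # E Lambda'(eps), cf. (2.18)
variance = trapezoid(y ** 2 * psi, y)          # sigma^2

print(fisher_info, fisher_alt, variance, 1.0 / (variance * fisher_info))
```

The last printed ratio compares the two bounds $1/I(\psi)$ and $\sigma^2$ in (3.20); for the logistic density it equals $9/\pi^2 \approx 0.91$, so the second inequality in (3.20) is strict but the gap is moderate.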

6.2. Improving regression estimation. It should be noted that for the
present results it is irrelevant whether the input estimator $\hat f_{(m)}$ attains the optimal
MISE rate. It is not hard to see, however, that conditions on $m$ and $f$ ensuring
this rate to be optimal are in general compatible with those in (2.12) - (2.14). In
this discussion, however, we will allow the truncation index $M = M(n)$ of the
traditional input estimator $\hat f_{(M)}$, that attains the optimal MISE rate, to tend to
infinity at a different rate than the $m = m(n)$ used above.

The results in this paper regard the variances of the limiting normal distributions
of the estimators, and not the variances or MSE's of these estimators themselves.
It is clear, however, that

(6.6) MSE$(\hat f_0)$ = Var $\hat f_0 = \frac{1}{n} \sigma_0^2(f)$,

and we conjecture that

(6.7) MSE$(\tilde f_0) = E(\tilde f_0 - f_0)^2 = \frac{1}{n} \tilde\sigma_0^2(f) + o\left( \frac{1}{n} \right)$.

This would imply that $\tilde f_0$ improves $\hat f_0$ also with respect to the MSE (at the level of
constants). We similarly expect each of the $\tilde f_k$ to improve regarding MSE, and
eventually

(6.8) $\tilde f_{(M)}(z) = \sum_{k \le M} \tilde f_k \varphi_k(z)$, $z \in \mathcal{Z}$,

to improve $\hat f_{(M)}$ with respect to MISE (at the level of constants). The actual
calculations leading to (6.7) will differ from those in Section 4 and might be lengthy.

Moreover, if theoretically $\tilde f_{(M)}$ would improve $\hat f_{(M)}$, it would be interesting to
perform simulations for several nonnormal error distributions, to get an insight
into the difference of the performance for finite sample sizes. All this is beyond the
scope of this paper.